Continuous-Time Learning of Probability Distributions: A Case Study in a Digital Trial of Young Children with Type 1 Diabetes

Álvarez-López, Antonio, Matabuena, Marcos

arXiv.org Machine Learning

Understanding how biomarker distributions evolve over time is a central challenge in digital health and chronic disease monitoring. In diabetes, changes in the distribution of glucose measurements can reveal patterns of disease progression and treatment response that conventional summary measures miss. Motivated by a 26-week clinical trial comparing the closed-loop insulin delivery system t:slim X2 with standard therapy in children with type 1 diabetes, we propose a probabilistic framework to model the continuous-time evolution of time-indexed distributions using continuous glucose monitoring (CGM) data collected every five minutes. We represent the glucose distribution as a Gaussian mixture, with time-varying mixture weights governed by a neural ODE. We estimate the model parameters using a distribution-matching criterion based on the maximum mean discrepancy. The resulting framework is interpretable, computationally efficient, and sensitive to subtle temporal distributional changes. Applied to CGM trial data, the method detects treatment-related improvements in glucose dynamics that are difficult to capture with traditional analytical approaches.
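The abstract's two core ingredients, a Gaussian mixture whose weights vary over time and a maximum mean discrepancy (MMD) matching criterion, can be sketched in miniature. The snippet below is a toy illustration, not the authors' implementation: the neural ODE governing the weight path is replaced by a simple linear logit trajectory `theta`, and all function names and parameters are hypothetical.

```python
import math
import random

def rbf(x, y, bw=1.0):
    """Gaussian (RBF) kernel between two scalar observations."""
    return math.exp(-(x - y) ** 2 / (2 * bw ** 2))

def mmd2(xs, ys, bw=1.0):
    """Biased estimate of the squared maximum mean discrepancy
    between two samples, using an RBF kernel."""
    kxx = sum(rbf(a, b, bw) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(rbf(a, b, bw) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(rbf(a, b, bw) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2 * kxy

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def mixture_sample(t, mus, sigmas, theta, rng):
    """Draw one value from a Gaussian mixture whose weights depend on time t.
    Toy stand-in for the neural-ODE weight path: logits evolve linearly in t."""
    w = softmax([a + b * t for a, b in theta])
    u, c = rng.random(), 0.0
    for wk, mu, sd in zip(w, mus, sigmas):
        c += wk
        if u <= c:
            return rng.gauss(mu, sd)
    return rng.gauss(mus[-1], sigmas[-1])
```

Fitting would then amount to minimizing `mmd2` between CGM readings observed near time `t` and samples drawn via `mixture_sample(t, ...)`, averaged over the trial period; here the pieces are shown separately for clarity.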





Multiparameter Persistence Images for Topological Machine Learning

Neural Information Processing Systems

However, in many applications there are several different parameters one might wish to vary: for example, scale and density. In contrast to the one-parameter setting, techniques for applying statistics and machine learning in the setting of multiparameter persistence are not well understood due to the lack of a concise representation of the results.
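For contrast with the multiparameter setting the abstract describes, the standard one-parameter persistence image admits a very concise representation: each (birth, death) pair contributes a persistence-weighted Gaussian evaluated on a fixed grid. The sketch below is illustrative only; `persistence_image` and its arguments are hypothetical names, not the paper's API, and it uses (birth, persistence) coordinates as one common convention.

```python
import math

def persistence_image(diagram, grid, bw=0.5):
    """Vectorize a persistence diagram as a flat image.
    diagram: list of (birth, death) pairs.
    grid: list of (x, y) evaluation points in (birth, persistence) coordinates.
    Each pair contributes a Gaussian bump weighted by its persistence,
    so long-lived features dominate the image."""
    img = []
    for gx, gy in grid:
        val = 0.0
        for b, d in diagram:
            pers = d - b
            val += pers * math.exp(
                -((gx - b) ** 2 + (gy - pers) ** 2) / (2 * bw ** 2)
            )
        img.append(val)
    return img
```

The multiparameter case is harder precisely because there is no single diagram of (birth, death) pairs to feed into such a map, which is the gap the paper addresses.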


Scattering GCN: Overcoming Oversmoothness in Graph Convolutional Networks - Supplement

Neural Information Processing Systems

Now, since |N(v)| = β, it holds that (Px)[v] = (a+b)/2, thus verifying the first claim of the lemma, as the choice of v was arbitrary. This construction essentially generalizes the graph demonstrated in Figure 1 of the main paper (see Sec. 7). The following lemma shows that on such graphs, the filter responses of gθ for a constant signal will encode some geometric information, but will not distinguish between the cycles in the graph. These responses with appropriate color coding give the illustration in Figure 1 in the main paper. Validation & testing procedure: All tests were done using train-validation-test splits of the datasets, where validation accuracy is used for tuning hyperparameters and test accuracy is reported in the comparison table.




Hierarchical topological clustering

Carpio, Ana, Duro, Gema

arXiv.org Machine Learning

Topological methods can explore data clouds without making assumptions about their structure. Here we propose a hierarchical topological clustering algorithm that can be implemented with any choice of distance. The persistence of outliers and clusters of arbitrary shape is inferred from the resulting hierarchy. We demonstrate the potential of the algorithm on selected datasets in which outliers play relevant roles, consisting of images, medical data, and economic data. These methods can provide meaningful clusters in situations in which other techniques fail to do so.
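The abstract does not fix a particular construction, but a hierarchy that works with any distance choice and exposes the persistence of clusters and outliers can be sketched via zero-dimensional persistence: merge connected components in order of pairwise distance with a union-find, recording the scale at which each component dies. The helper below is a hypothetical illustration in that spirit, not the authors' algorithm.

```python
def component_persistence(points, dist):
    """0-dimensional persistence via single-linkage merging.
    points: list of data items; dist: any symmetric distance function.
    Returns the sorted merge scales (death times of components);
    long gaps before a death suggest well-separated clusters or outliers."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Sort all pairwise edges by distance, then merge components greedily.
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i in range(n) for j in range(i + 1, n)
    )
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(d)  # one component dies at scale d
    return deaths
```

Because `dist` is an arbitrary callable, the same routine applies to images, medical records, or economic indicators given any suitable metric, matching the distance-agnostic claim in the abstract.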


NanoBaseLib: A Multi-Task Benchmark Dataset for Nanopore Sequencing

Neural Information Processing Systems

Nanopore sequencing is a third-generation sequencing technology capable of generating long-read sequences and directly measuring modifications on DNA/RNA molecules, which makes it ideal for biological applications such as human Telomere-to-Telomere (T2T) genome assembly, Ebola virus surveillance, and COVID-19 mRNA vaccine development. However, the accuracy of computational methods across the various tasks of Nanopore sequencing data analysis remains far from satisfactory. For instance, the base calling accuracy of Nanopore RNA sequencing is $\sim$90\%, while the aim is $\sim$99.9\%. This highlights an urgent need for contributions from the machine learning community. A bottleneck that prevents machine learning researchers from entering this field is the lack of a large integrated benchmark dataset.